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One  of  the  most  widely  used  methodologies  in  scientific  and  engineering  research  is  the 
fitting  of  equations  to  data  by  least  squares.  In  cases  where  significant  observation  errors 
exist  in  all  data  (independent)  variables,  however,  the  ordinary  least  squares  approach, 
where  all  errors  are  attributed  to  the  observation  (dependent)  variable,  is  often  inappro¬ 
priate.  An  alternate  approach,  suggested  by  several  researchers,  involves  minimizing  the 
sum  of  squared  orthogonal  distances  between  each  data  point  and  the  curve  described  by 
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the  model  equation.  We  refer  to  this  as  orthogonal  distance  regression  (ODR).  This  paper 
describes  a  method  for  solving  the  orthogonal  distance  regression  problem  that  is  a  direct 
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volved  is  the  number  of  model  parameters  plus  the  number  of  data  points,  often  a  very 
large  number.  By  exploiting  sparsity,  however,  our  algorithm  has  a  computational  effort 
per  step  which  is  of  the  same  order  as  required  for  the  Levenberg-Marquardt  method  for 
ordinary  least  squares.  We  prove  our  algorithm  to  be  globally  and  locally  convergent,  and 
perform  computational  tests  that  illustrate  some  differences  between  the  two  approaches. 
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Abstract 

)  One  of  the  most  widely  used  methodol(^ie8  in  scientific  and  engineering  research  is  the 
fitting  of  equations  to  data  by  least  squares.  In  cases  where  significant  observation  errors 
exist  in  all  data  (independent)  variables,  however,  the  ordinary  least  squares  approach, 
where  all  errors  are  attributed  to  the  observation  (dependent)  variable,  is  often  inappro¬ 
priate.  An  alternate  approach,  suggested  by  several  researchers,  involves  minimizing  the 
sum  of  squared  <»thogonal  distances  between  each  data  point  and  the  curve  described  by 
the  model  equation.  We  refer  to  this  as  orthogonal  distance  regression  (ODR).  This  paper 
describes  a  method  for  solving  the  orthogonal  distance  regression  problem  that  is  a  direct 
analog  of  the  trust  region  Levenberg-Marquardt  algorithm.  The  number  of  unknowns  in¬ 
volved  is  the  number  of  model  parameters  plus  the  number  of  data  points,  often  a  very 
large  number.  By  exploiting  sparsity,  however,  our  algorithm  has  a  computational  effort 
per  step  which  is  of  the  same  order  as  required  for  the  Levenberg-Marquardt  method  for 
ordinary  least  squares.  We  prove  our  algorithm  to  be  globally  and  locally  convergent,  and 
perform  computational  tests  that  illustrate  some  differences  between  the  two  approaches. 


When  obeervationB  ere  corrupted 
By  errors  random  i.i.d. 

Least  Squares  provides  the  answer 
A  true  blue  friend  it  be. 

But  when  the  independents 
In  addition  come  imclean 
Nonlinearities  enter 
And  friends  desert  the  scene. 

Now  in  your  darkest  hour 
When  hope  is  all  but  gone 
This  paper  comes  to  show  you 
How  estimation  can  go  on. 

What  is  the  shining  hope 
That  beacon  from  afar? 

Is  it  that  new  fotmd  wonder 
That’s  known  as  ODR? 

Yes!  That’s  right!  You’ve  got  it! 
And  we  tell  all  below 
From  problem  through  solution 
With  codes  to  make  it  go. 

So  calm  your  pounding  heart 
And  mop  your  sweating  brow. 
Your  data  can  be  handled 
And  we  will  show  you  how. 


1.  bitroductiom 

The  problem  of  fitting  a  model  to  data  with  errors  in  the  observations  has  a  rich  history 
and  a  considerable  literature.  The  problem  where  there  are  also  errors  in  the  independent 
variables  at  which  these  observations  are  made,  however,  has  only  relatively  recently  been 
given  attention.  In  this  paper,  we  consider  a  general  form  of  this  extended  problem  and 
provide  an  efficient  and  stable  algorithm  for  its  solution.  Several  names  for  this  extended 
problem  have  been  suggested;  we  prefer  orthogonal  distance  regression  (ODR). 

Errors  in  independent  variables  virtually  always  occur,  but  are  often  ignored  in  or¬ 
der  that  classical  or  ordinary  (linear  or  nonlinear)  least  squares  (OLS)  techniques  can 
be  applied  (see,  e.g.,  [LawH74],  [Ste73],  [Mor77],  [DenGWSl]).  Also,  if  these  errors  are 
small  with  respect  to  those  in  the  observed  variables,  then  ignoring  them  does  not  usually 
seriously  degrade  the  accuracy  of  the  estimates.  In  some  fields,  however,  measurement 
techniques  are  sufficiently  acctirate  that  errors  in  the  independent  variables  are  not  in¬ 
significant  compared  to  those  in  the  observations.  Examples  at  the  National  Bureau  of 
Standards  (NBS)  include  the  calibration  of  electronic  devices,  fiow-meters  and  calorime¬ 
ters.  Another  class  of  examples  comes  from  curve  and  surface  fitting  problems. 

We  first  develop  a  formal  statement  of  the  ODR  estimation  problem  and  briefiy  disctiss 
its  application  to  statistical  estimation  and  to  curve  fitting.  The  main  contributions  of 
the  paper  are  the  derivation  and  convergence  analysis  of  a  highly  efficient  algorithm  for 
solving  ODR  problems  (Sections  2  and  3).  In  Section  4,  the  results  of  some  computations 
are  shown  which  illustrate  the  performance  of  the  algorithm  and  allow  some  comparisons 
with  ordinary  least  squares. 

Observations  in  applied  science  are  often  thought  of  as  satisfying  a  mathematical 
model  of  the  form 

(1.1)  y  =  f{x,P) 

where  y  is  taken  to  be  the  “observed'’  value,  or  independent  variable;  and  0  E  R’’*  is  the  set 
of  parameters  to  be  estimated.  The  function  /  is  not  assumed  to  be  linear,  but  is  assumed 
to  be  smooth.  The  data  are  simply  the  pairs  (z^y,),*  =  /,...  ,n.  Typically  the  number  of 
data  points,  n,  is  far  greater  than  the  number  of  parameters,  p. 

In  the  classical  case,  only  the  observations  y,  are  assumed  to  be  contaminated  with 
errors.  If  these  errors  are  additive  and  the  mathematical  model  is  exact  then 

(1.2)  y,  =  /(x,,  4)  +  e,  $  =  1, . . . ,  n 

for  some  correct  value  of  the  puameters  0.  If  in  addition  the  errors  are  normally  distributed 
with  mean  0  and  variance  I,  then  maximum  likelihood  estimate  of  ^  is  the  solution  to 
the  least  squares  problem 

n 

(1.3)  min  [y,  -  f{xi,  0)]^ . 

^  •=i 

If  /  is  a  linear  function  of  P  then  this  is  a  classical  linear  least  squares  problem,  otherwise 
it  is  a  classical  nonlinear  least  squares  problem.  Even  when  the  above  assumptions  on  the 


model  or  the  errors  are  not  satisfied,  problem  (1.3)  is  the  most  frequently  used  method  for 
parameter  estimation. 

In  the  more  general  situation,  the  measurements  of  the  independent  variables  Xi  are 
also  assumed  to  contain  errors.  If  we  assume  that  yi  has  unknown  additive  error  £«  and 
that  Xi  has  unknown  additive  error  d,-,  then  (1.2)  becomes 

(1.4)  Vi  =  /{xi  +  Si;  fi)  +  c,-. 

An  intuitively  reasonable  way  to  select  the  parameters  in  thb  case  is  to  choose  the  0 
that  causes  the  sum  of  the  squares  of  the  orthogonal  distances  from  the  data  points  (zt ,  y,) 
to  the  curve  f{x,0)  to  be  minimized.  (See  Figure  1.)  If  r,-  is  the  orthogonal  distance  from 
{xi,yi)  to  the  curve,  then 

r?  =  *  =  1, . . . ,  n 

where  c,  and  Si  solve 

mm(6?  +  S^) 

(1.5) 

subject  to  f{xi  -f-  Sii  0)  +  c,  =  y,. 

The  constraint  in  (1.5)  ensures  that  the  distance  u  connects  the  point  (z,',yi)  to  the 
curve.  The  minimization  ensures  that  r,-  is  the  radius  of  the  smallest  circle  centered  at 
(z,-,y,-)  which  is  tangent  to  the  curve  f{xi;0).  (See  Figure  2).  Therefore,  the  parameters 
0  that  cause  the  sum  of  the  squares  of  the  orthogonal  distances  from  the  data  points  to 
the  curve  to  be  minimized  axe  found  by  solving 

min  =  min  (e*  +  Sf) 

(1.6) 

subject  to  y,  =  /(x,  +  6,;  0)  -f  c„  i  =  1, . . . ,  n. 

Since  the  constraints  in  (1.6)  are  simple  linear  construnts  in  e,,  we  solve  for  and  eliminate 
both  these  variables  and  all  of  the  constraints  thereby  obtaining 


which  is  now  an  unconstrained  minimization  problem. 

Two  slight  extensions  to  this  form  constitute  the  ultimate  problem  to  be  considered. 
The  first  allows  the  possibility  that  x»  €  R”*  rather  than  R^.  Therefore,  Si  €  R”^  and 
instead  of  S^  in  (1.7)  we  have  S^Si  =  (The  superscript  T  denotes  trsmspose.) 

The  second  extension  merely  admits  a  general  weighting  scheme  on  the  problem.  The  form 
we  have  chosen  results  in  the  general  nonlinear  ODR  problem 


[(/(i.  +  Si;0)  -  y,)^  + 


{ODR) 


where  Wi  >  0,i  =  and  Di  =  diag{dtj  >  0,  j  =  ,  t  =  i.e.,  Di  is 

a  diagonal  matrix  of  order  m.  It  follows  that  the  vectors  y,  w  €  £**  and  x,6  e.  R'*”*  and 

«.« 

While  we  have  not  assumed  that  /  is  linear,  it  is  important  to  note  that  (ODR)  is  a 
nonlinear  optimization  problem  even  if  /  is  the  simple  linear  fimction 

y  =  /3ii  +  ^2 

since  we  then  have  that 

yi  =  01  (x,-  +  5.)  -]-02  + 

Clearly  the  product  of  0i  and  6i  is  an  unavoidable  nonlinearity. 

ODR  problems  have  been  considered  by  statisticiaiu,  usually  under  the  rubric  errors 
in  variables.  Most  of  this  effort,  however,  has  been  devoted  to  linear  models,  i.e.,  when 
/  is  linear  in  0.  (See  e.g.,[Mor71],  [KenSOSS],  [Bar74,p.67]  and  [Ful86].)  As  in  the  classi¬ 
cal  nonlinear  least  squares  case,  little  theory  on  the  statistical  properties  of  the  solution 
appears  to  exist.  It  is  known  that  if  both  e  and  S  are  normally  distributed  with  mean 
zero  and  variances  and  as  I  respectively,  then  the  solution  of  (ORD)  with  tvi  =  1  and 
Di  =  [atlas)li  t  =  1, . . .  ,n  is  a  maximum  likelihood  estimate  of  the  parameters.  Unfor¬ 
tunately,  as  in  the  nonlinear  classical  case,  no  generally  valid,  computationally  efficient, 
inferential  statistical  tests  are  known. 

Independent  of  statistical  considerations,  ODR  has  potentially  significant  applications 
in  curve  and  surface  fitting.  Consider,  for  example,  the  problem  of  finding  the  parabola 
which  best  fits  the  given  set  of  points  (see  Figtire  3.).  (We  have  seen  this  problem  arise  from 
a  dental  application.)  Here  it  is  clear  that  ordinary  least  squares  will  unduly  weight  the 
top  data  points,  while  fitting  in  the  horizontal  direction  would  undully  weight  the  bottom 
data  points.  An  orthogonal  measure  of  distance  alleviates  these  problems  and  provides  a 
reasonable  fit.  A  related  case  is  the  problem  of  fitting  near  an  asymptote  as  illustrated 
in  Figure  4.  Orthogonal  distances  here  prevent  the  undue  infiuence  of  points  close  to  the 
asymptote.  This  problem  is  discussed  further  in  Section  4. 

The  literature  contains  several  algorithms  for  solving  (ODR)  and  related  problems. 
For  example,  Golub  and  Van  Loan  [GolV83]  give  a  singular  value  decomposition  procedure 
for  the  problem  when  /  is  linear.  They  refer  to  this  problem  as  total  least  squares.  Britt 
and  Luecke  [BriL73]  consider  the  nonlinear  case  as  well  as  the  nonlinear  implicit  case  and 
present  an  algorithm.  Recently,  Schwetlick  and  Tiller  [SchT85]  proposed  an  algorithm 
similar  to  the  one  here  for  the  nonlinear  problem.  Our  algorithm,  however,  does  not  make 
use  of  the  singular  value  decomposition  and  it  does  incorporate  a  full  trust  region  strategy. 

3.  The  Algorithm 

In  order  to  solve  the  minimization  problem  (ODR), 

(2.1)  xmn  ^  wj  [(/(x,-  -I-  ;9)  -  y,)^  -t-  ST 

’  •=! 

we  first  express  it  in  a  more  convenient  form  and  simplify  the  notation.  Next,  we  give  an 
overview  of  the  iteration  which  is  based  on  the  trust  region  -Levenberg-Marquardt  strategy 


3 


popularized  by  Mor4  [Mor77].  (See  also  [Heb73],  [DenS83].)  We  then  show  how  to  modify 
this  technique  to  obtain  an  algorithm  which  requires  the  same  order  of  work  per  iteration 
as  these  algorithms  applied  to  the  same  problem  without  allowing  changes  to  z^.  That  is, 
if  the  S^s  are  held  fixed  at  zero,  ODR  reduces  to  OLS  and  trust  region  methods  require 
O(np^)  operations  per  iteration.  Our  algorithm,  by  exploiting  the  structure  of  (ODR), 
still  requires  only  O(np^)  +  0{nm)  operations  per  iteration  to  solve  the  problem. 

While  we  have  designed  and  implemented  the  algorithm  to  handle  the  full  generality 
of  (2.1),  the  notation  is  considerably  simplified  by  assuming  z,-  G  R^.  We  temporarily  make 
this  assumption  and  rewrite  (2.1)  into  the  form  of  an  OLS  problem  by  the  following  device. 
Let 

*  '  '  \  i=n+l . 2n. 

Also  let  G  :  RP+”  —*■  have  component  functions  giiv)  where  q  =  (f).  Now  (ODR) 
becomes 


(2.3)  min||G(f7)l|^  =min5^(p.(/?,«))* 

^  .=1 

which  is  an  OLS  problem  with  (p  +  n)  parameters  and  2n  equations.  (In  all  cases  in  this 
paper,  ||•||  denotes  the  vector  or  matrix  norm.)  Direct  application  of  trust  region  meth¬ 
ods  to  (2.3)  would  require  0(2n(n  +  p)^)  operations  per  iteration  which  rapidly  becomes 
prohibitive  if  n  is  large.  (Recall  that  n  is  usually  far  greater  than  p  in  practice.) 

The  basic  idea  of  a  trust  region  strategy  is  to  choose  as  the  step  that  vector  which  mini¬ 
mizes  a  linear  approximation  to  G  over  a  region  in  which  the  linearization  is  a  “reasonable” 
approximation  to  G  .  Specifically,  if  G'(r;®)  G  R2nx(n+p)  Jacobian  matrix  of  G 

evaluated  at  the  current  iterate,  q®,  then  the  step  z  is  chosen  by  solving 

mm||G(q®)-hGV)^ll=* 

subject  toljZzlj  <  T 

where  Z  is  a  nonsingular  (usually  diagonal)  scaling  matrix  and  r  is  the  trust  region  radius. 
It  is  easy  to  show  that  the  solution  to  (2.4)  is  given  by  the  z(a)  satisfying 

(2.5)  (G'(q®)’’G'(q®)  +  aZ^Z)  z{a)  =  -G'{r,^fG{r,^) 

where  a  >  0  is  the  Lagrange  multiplier  for  the  inequality  constraint.  Note  that  if  |iz(0)||  < 
r,  a  =  0  and  the  constraint  is  inactive.  Otherwise  a  >  0  and  the  constraint  is  active. 
Equation  (2.5)  is  the  famous  Levenberg-Marquardt  formula,  but  this  derivation  has  given 
rise  to  more  stable  and  robust  implementations.  (See,  e.g.,  [Mor77)  and  [DenS83]).  Clearly 
(2.5)  can  be  regarded  as  the“normal  equations”  for  the  extended  least  squares  problem. 
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where  V,  €  The  rest  of  the  development  now  allows  i,-  €  R”^. 

To  derive  an  efficient  procedure  for  solving  (2.7),  we  first  form  the  normal  equations 
associated  with  (2.7): 


(2.8) 


fj^J  +  aS^  J'^V  \ 

VTj  V^V  +  +  aT^  )  \t  )  ~  yV^Gi  +  DG^)' 


Let  P  =  V'^V  +  D^  +  aT^.  Solving  the  bottom  equations  of  (2.8)  for  t  in  terms  of  s  yields 
(2.9)  t  =  -P-^(K’’Gi  +  £>G2  +  F^Js) 


and  then  substituting  (2.9)  into  the  top  equations  of  (2.8)  gives 


‘.W 


(2.10)  [J^(/  -  VP-^V'^)J  +  aS^j  s  =  -j'^Gi  +  (V^Gi  +  DG^). 

Thus  s  is  the  solution  to  the  least  squares  problem. 


(2.11) 


( 


(/- VP-lVr)l/2j\ 
^1/2  )  ^ 


=.( 


(/  -  VP-W^)-^/^  [-Gi  +  VP-\V'^Gi  +  I>G2)]  \ 


The  following  propositions  provide  the  necessary  tools  for  the  accurate  and  efficient 
solution  of  (2.9)  and  (2.11). 

Proposition  2.1.  Let  E  ~  +  aT^.  Then,  V P  ^  =  diag  |  ,  t  =  1, . . .  n|  where 


V.2. 

•J 


Proof.  We  have  that  P  is  a  nonsingular  diagonal  matrix  and  that  V^V  is  block  diagonal 
where  the  blocks  are  mxm  rank  1  matrices.  Thus  P~^  can  be  calculated  by  the  Sherman- 
Morrison- Woodbury  formula,  (see,  e.g.,  [DenS83,  page  188]  or  [OrtR70,  page  50|)  as 


(2.12) 


=  {V'^V  +  P)"* 

=  p-‘  -  E-^V'^{I  +  VE-^V'^)-^VE-\ 


By  direct  calculation,  VE  =  diag{u;,,  i  =  l,...,m}  with  w,  as  defined  above.  The 
calculation  of  VP~^V'^  follows  directly  from  (2.12). 

Proposition 2.2.  (/-VP~‘V^)‘/^  =  diag |  ,  i  =  1, . . . ,  n  I  where  w,  is  as  above. 

Proof.  The  result  follows  immediately  from  Proposition  2.1. 


l'  X..' 


Propodtion  2.S.  With  ui  as  above 


VP-^  =  dia«|-^,  »■  =  V£7-^ 

1 1  +  w,  J 

=  =  l . n}. 


Proof.  The  results  follow  directly  from  (2.12)  and  Propositions  2.1  and  2.2. 
Proposition  2.4.  The  right  band  side  of  (2.11)  is  given  by 

(I  -  [-Gi  +  VP^^(V^Gi  +  DGa)] 

=  -dia«|  ^  njci-Vh^-^DGa. 


Proof.  We  have  that 


-  Gi  +  VP-^(V^Gi  +  DGi) 

=  -(/  -  Vp-^V^)Gi  +  Vp-^VGi 

=  -diagl— — ,  i  =  l,...,n|  Gt  +  VP'^DGt. 

1. 1  +  w,-  ) 

Now  the  result  follows  immediately  from  Propositions  2.2  and  2.3. 

With  the  above  formulas,  we  can  solve  (2.11)  for  s  by  first  calculating  E  and  u;  in 
0(nm)  operations  and  then  forming 


J  =  diag 


1  + 


-ji/a  'j 

,  i  =  l,...,n  ^ 


y  =  -diag  {(1  +  u;.]'/"  ,  i  =  1, . . .  ,n}  (Gi  -  VE-^DG2). 


Now  rewrite  the  least  squares  problem  (2.11)  as 


(2.13) 


which  requires  0(np  +  nm)  operations.  The  solution  of  (2.13)  then  involves  a  QR  de¬ 
composition  of  J  (accomplished  by  Householder  transformations  with  column  pivoting) 
and  then  a  sequence  of  plane  rotations  to  eliminate  The  cost  is  for  this  phase  is 

dominated  by  the  O(np^)  operations  for  the  QR  decomposition  of  J. 


7 


Using  Proposition  2.3  and  (2.12),  it  is  easily  verified  that 


t  =  -E 


-i 


V^diag  »■  =  1, . . .  ,n|  (Gi  +  -  VE-^DG2)  +  DG2 


which  is  dominated  in  cost  by  the  0(np)  operations  needed  to  form  Js  and  several  0{nm) 
terms.  Thus  the  leading  cost  of  calculating  a  step  for  ODR  is  the  same  O(np^)  operations 
needed  to  do  the  factorization  of  an  n  x  p  matrix  as  in  OLS.  The  only  additional  costs  are 
a  small  number  of  calculations  costing  0(nm)  or  0(np)  operations. 

It  may  occur  to  the  reader  that  an  efficient  QR  factorization  of  the  matrix  in  (2.7) 
might  yield  a  procedure  with  the  same  order  of  work.  By  re-ordering  the  upper  2x2  blocks, 
one  can,  indeed,  do  the  factorization  of  this  part  in  O(np^)  operations.  The  subsequent 
elimination  of  the  aS  and  aT  blocks,  however,  would  require  0((nm  -I-  p)^)  operations 
for  each  a.  It  is  for  this  reason,  as  well  as  others,  that  Schwetlick  and  Tiller  [SchTSS]  do 
only  a'^partial’*  trust  region  strategy,  i.e.,  their  trust  region  only  applies  to  the  step  in  the 
/3  variables.  In  some  badly  scaled  problems,  however,  (e.g..  Example  3  in  Section  4)  the 
ability  to  scale  and  constrain  the  step  in  £  is  essential  to  solve  the  problem. 

The  above  formulas  for  s  and  t  are  used  for  each  a  value  in  (2.5).  Thus  in  order  to 
complete  the  specification  of  the  algorithm,  we  need  to  provide  the  procedure  for  computing 
the  trust  region  parameter  a  to  satisfy  (2.4)  and  to  discuss  a  few  miscellaneous  details. 

Mor^  [Mor77],  following  Hebdon  [Heb73j  (see  also  [DenS83]),  suggested  a  procedure 
for  computing  a  in  (2.5)  so  that 

||s(a)||  «  r 

when  ||s(0)||  >  r.  This  procedure  is  based  on  approximating  the  function 


<f>(a)  =  ||z(a)j|  -T 


by  a  rational  function  ff(a)  =  'yf(p  —  a)  where  the  constants  7  and  /i  are  chosen  by  making 
fl(Q®)  =  ^(a®)  and  tf'(a®)  =  ^'(a®).  This  results  in  the  iteration 


a 


+ 


^'(o*)  r 


which  is  a  modified  Newton  step  for  the  equation  (^(a)  =  0.  In  our  case,  the  derivative 
of  ^(a)  is  not  as  simple  to  compute  as  in  OLS  and  thus  we  opt  to  compute  7  and  /x  by 
making  =  ^(o®)  and  tf(a“)  =  ^(a“).  (a“  is  the  previous  estimate.)  Then 


a ' 


=  a 


^(a®)(a®  -  g-)  (^(a®)  +  r 

^(a®)  -  ^(o-)  r 


which  is  clearly  a  modified  secant  step  for  the  equation  ^(a)  =  0.  Mor^  found  it  neces¬ 
sary  to  safeguard  his  procedure  by  computing  and  updating  upper  and  lower  bounds  on 
a.  Similarly,  we  maintain  such  bounds,  but  with  formulas  appropriate  to  the  secant-like 
method.  These  will  be  provided  in  a  subsequent  paper. 


The  trust  region  bound  r  is  updated  according  well-tested  ideas  which  are  in  several 
existing  jcodes.  (See  e.g.,  [Mor77],  [Gay84],  [SchKW86].)  A  valuable  feature  in  our  code 
has  been  the  “internal  doubling”  step.  For  a  given  rj‘  and  r,  suppose  Zr  is  generated  such 
that  r  restricts  2»  and  the  reduction  in  ||G||  predicted  by  the  linear  approximation  agrees 
with  the  actual  Eduction  in  ||G||  to  a  high  precision.  Normally,  one  would  accept  Zr  , 
set  :=  ti‘  +  Zr  and  double  r  for  the  next  step.  The  internal  doubling  procedure  is  to 
remember  r)’’  :=  17^ + Sr  t  double  r  and  generate  a  new  Zr  from  17^.  Note  that  this  procedure 
only  costs  an  evaluation  of  G  and,  if  successful,  may  save  several  evaluations  of  J  .  In 
practice,  it  has  been  successful  often  enough  to  warrant  leaving  it  in.  Its  main  advantage  is 
that  it  permits  rapid  and  cheap  increases  in  r  based  on  an  overly  conservative  initisd  value 
of  T  or  when  the  iterates  are  moving  away  from  a  highly  nonlinear  region  in  parameter 
space. 

Since  many  users  will  want  to  compare  the  results  of  OLS  with  ODR,  an  option  to  do 
OLS  has  been  implemented.  Enabling  this  option  merely  initializes  the  S  vector  to  zero 
and  sets  V  to  zero  whenever  it  is  computed.  It  is  easily  verified  that,  in  this  case,  (2.11) 
reduces  to  the  OLS  Levenberg-Marquardt  step  and  (2.0)  yields  t  =  0  leaving  6  =  0.  Using 
this  procedure  to  do  OLS,  therefore,  is  equivalent  to  a  standard  OLS  adgorighm  with  a 
moderate  extra  algebraic  overhead. 

S.  Local  and  Global  Convergence  Analysis 

The  global  convergence  properties  of  trust-region-Levenberg-Marquardt  methods  ap¬ 
plied  to  the  general  nonlinear  least  squares  problem  (2.3)  are  well  known  (see  e.g.,  [Pow75], 
[Mor77],  [MorS81],  [ShuSB85]).  As  long  as  the  sequence  of  Jacobian  matrices,  {G'(f7ib)}, 
is  unifoTxnly  boimded,  then 

lim  G'(rik)^G{r}k)  =  0, 

k^oo 

so  that  any  cliister  point  satisfies  the  first  order  necessary  conditions  for  a  local  minimizer. 
These  results  apply  to  our  algorithm  and  nothing  more  needs  to  be  said  regarding  global 
convergence. 

The  local  convergence  behavior  of  general  trust-region-Levenberg-Marquardt  methods 
for  nonlinear  least  squares  is  discussed  by  Byrd  and  Schnabel  [ByrS85]  who  show  that,  if 
there  is  a  Cluster  point  t},  where  G'{r),)  is  nonsingular,  then  the  iterates  converge  at 
least  linearly  to  ri,  independent  of  the  size  of  G(r7,).  This  theory  also  applies  to  our 
algorithms.  If,  in  addition,  the  residual  G(r7.)  is  sufficiently  small,  Byrd  and  Schnabel  show 
that  asymptotically  the  triist  region  constraint  becomes  inactive,  and  that  the  Levenberg- 
Marquardt  algorithm  reduces  to  the  Gauss-Newton  iteration 

»7*+i  =r}k-  [G'(T7fc)^G'(»7fc)]~^  Gf'(f7fc)^G(i7fc) 

and  is  linearly  convergent  to  >7,.  The  linear  convergence  analysis  of  the  Gauss-Newton 
method  is  well  known  (see  e.g.,  [OrtR70],  [DenS83]).  The  constant  of  linear  convergence 
,  depends  upon  the  smallest  singular  value  of  G'(r7,),  the  residual  G(r7,) ,  and  the  nonlinearity 
of  G(f7)  near  ri,. 

The  small  residual  analysis  is  particularly  relevent  to  ODR  because  most  applications 
of  ODR  will  have  small  residuals.  This  is  especially  true  when  ODR  is  used  to  consider 


*%  >■., 


••  * 


f  • 


•  V* 


0 


errors  in  independent  variables  in  parameter  estimation,  because  errors  in  the  independent 
variables  are  most  likely  to  be  considered  when  the  model  and  the  dependent  variable 
measurements  are  accurate,  which  implies  that  the  residuals  will  be  small. 

It  turns  out  that  the  application  of  the  local  Gauss-Newton  analysis  to  ODR  is  non¬ 
trivial,  although  the  expected  results  can  be  proven.  This  is  the  Tn».in  contribution  of  this 
section.  To  simplify  the  algebra  here,  we  consider  a  version  of  the  ODR  problem  (2.1)  with 
the  simplified  weighting  scheme  Wi  =  1  and  d«  =  o  for  all  »,  i.e., 

(3.1)  ^  ^  [(/(x,  +  Sii  0)  -  y.)  2  i,] 

where  o  >  0.  This  weighting  still  allows  the  metric  of  distance  from  the  curve  /(x;  0)  to  the 
data  points  (2t,yi)  to  vary  from  vertical  (as  rr  — »  oo)  to  orthogonal  (cr  =  1)  to  horizontal 
(as  a  — »  0).  (We  explain  this  statement  more  carefully  later  in  this  section.)  This  is  all 
the  generality  in  the  weighting  that  is  usually  used  in  practice,  and  precisely  what  we  use 
in  most  of  our  computational  results  in  Section  4. 

In  fact,  as  we  illustrate  in  Section  4,  in  practical  ODR  applications,  the  user  may  wish 
to  solve  (3.1)  for  various  values  of  o.  A  second  contribution  of  this  section  is  to  produce  a 
bound  on  the  constant  of  linear  convergence  the  Gauss-Newton  method  applied  to  (3.1) 
that  is  independent  of  the  value  of  o. 

To  further  simplify  notation,  we  rewrite  (3.1)  as 

(3.2a)  min  R(tj)^R(ij)  +  0*6^5 

or  equivalently, 

(3.26)  mmG(ff)^G(rf) 


where  6  =  (fif,  6,^,. . .  r,  =  (0^,  S^)^,  Jl(ff),  =  /(x,  +  «.;  ^)  -  y.,  i  =  1 . n,  and 

G(ff)  =  {R(ri)^ ,cS^)^ ,  Our  analysis  will  not  depend  upon  the  special  form  of  Jl(fr)  in  any 
way.  Recall  that 

y) 


where  J(tj)  and  V (ij)  are,  as  in  Section  2,  the  derivatives  of  Jl(rf)  with  respect  to  0  and  6 
respectively. 

The  difficulty  in  applying  standard  Gauss-Newton  analysis  to  (3.2)  is  that  G(ff)  and 
G'[ri)  are  functions  of  a.  In  Theorem  3.5  we  show  that  the  convergence  can  be  analyzed 
in  terms  of  the  properties  of  J(»?),  V'(rj),  R(t)*),  and  6,  only,  i.e.,  independent  of  a  except 
for  its  role  in  determining  f/,.  Lemmas  3.3  and  3.4  are  \ised  in  the  proof  of  Theorem  3.5  to 
bound  the  linear  and  quadratic  terms  in  the  convergence  results,  respectively,  independent 
of  a.  Lemmas  3.1  and  3.2  are  technical  results  used  in  the  proof  of  Lemma  3.3.  Theorem 
3.5  can  be  applied  to  the  more  generally  weighted  problem  (2.1)  by  an  appropriate 

change  of  variables. 


Lemma  8.1.  LetAe  C  6  B  €  and  let 


be  symmetric  and  positive  definite.  Tfien  ||^||  <  ||A||  +  ||C||. 

Proof:  Define  o  =  llAll,c  =  llCH-  For  any  v  e  R’*  and  iv  e  R^,  consider 

■  vlliwllv^ 

y  = 

Then,  since  H  is  positive  definite, 

O  <  y'^Hy  =  v^Av^wW^e  -  2v'^ Bw\\w\\\\v\\i/M  +  w^Cv>\\v\\^a 


and  therefore 


«  T«  ^  v^Avllwlpc  +  o 

- iMilMWS 


||e||^o||wf  c  +  lltelpc  ||ef  a 

||t£;||||t;|iv/Sc 

=  2llvlllltwll^/5^ 

<  l(vf  c  + 


Thus 


llfl'll  =  max 
"  '  w€il« 


v'^Av  +  2v^Bw  +  w'^Cv) 
v'^v  +  w'^w 

l|v||^a  +  (llvll^c  +  ||«^ll’*o)  +  Ikll^g 


<  mRTf  ^ - = - - - 

“  we*',  »€««  v^v  +  rv^w 

=  o  +  c. 

i^wnnt»  J.2,  Let  A  e  have  fall  column  rank  and  B  €  iZ"^”  be  positive  definite. 

Then  ||(A^B-iA)-MI  <  ||(ArA)-MII|5||. 

Proof:  y 

Lemma  8.8.  Let  J  e  R’^^’’  have  /uii  column  rank,  V  G  R’*^’’,  I  the  q  x  q  identity,  and 
a  a  positive  scalar.  Let  M{a)  “  <t/^  ’  ~  M{c)^ M{o).  Then  N{a)  is 

nonsingular  and 

(3,3)  1|W(»)-MI  <  1I(J’'J)'‘II  +  (1  +  IK-'’’-')"'!!  Il''ll') • 


Proof;  Since  J  has  full  column  rank,  Af(cr)  has  full  column  rank.  Thus  JV(<t)  is  positive 
definite,  and  it  is  straight  forward  to  verify  that  N{a)~^  ~  c)  ’ 

C=  [a^J  +  V'^  {I  -  J{J^J)-^J^)V]~\ 

B  =  -{j'^jy^j'^vc. 

From  Lemma  3.2, 

Mil  < l|V’’V+<,“/|| 

(3-4)  =<’-'ll(j’'j)-'ll(IH'’-V||  +  <r3) 

=  »-’||(J^J)-‘||  ||Vf +  ||(J’-J)-'||. 

Since  C~^  -  a^I  =  [/  -  J(J^  J)~  V  is  positive  semi-definite,  the  smallest  eigen¬ 

value  of  C~*  is  at  least  0“^  which  shows 

(3.5)  lldl  < 

Thus,  since  is  positive  definite,  applying  Lemma  3.1  and  using  (3.4)  and  (3.5) 

gives  (3.3). 

Lemma  3.4.  Let  the  assumptions  of  Lemma  3.3  bold,  and  let  ue  R'*.  Lets  be  the  solution 
to 


<’■«>  Ilfo 

II  \  0  /  \®/|  ' 

J)~^  and  let  V"*"  denote  the  pseudoinverse  of  V.  Then 

N<[ll'''"ll +  11^+11  (I +  Iiv||  iiv+ii)]  ||«||. 

Proof:  Define  y  €  Rp^'>  by 

_  fl  -J+V\ 

"  \0  / 

and  let  r  =  (  22  ^  ^  ~  ( 1/2  )  G  i2^,and  22,  y2  6  R^.  Then  (3.6)  is  equivalent 


mm 

y€fi«’+ 


where  P  —  J  )  is  a  projection  matrix.  From  the  normal  equations  using  J^p  =  0, 
the  solution  to  (3.7)  is 

yi  =  J+u 

y2  =  {V^PV  +  o^I)-^V’^Pu. 


'XT  -,T*.-l 


It  is  well  known,  for  instance  from  the  theory  of  Levenberg-Marquardt  methods,  that 

IIkII<||(pv)+u||, 

and  tinea  (PV)*  =  V+P  and  HPH  <  1, 

IIotII  <  llv+ll  INI.  Hull  <  ||J+||  IM- 
Finally  since  ta  =  ya  and  ti  =  yi  —  J^Vy^, 

ll^tll  <  ll‘'*ll  ll«ll 

and 

l|till<ll»ill  +  lk'"ll  IIJ'll  llinll 
:£||-''"ll  (i  +  ll»'il||»''"ll)ll«ll- 

Using  ||s||  <  llzill  +  gives  the  desired  result. 

Throughout  the  statement  and  proof  of  Theorem  3.5,  we  will  often  omit  the  argument 
t/;  i.e.,  we  will  denote  G{rit)  and  G{rio)  by  G,  and  Gq,  respectively,  and  likewise  for  other 
symbols  in  place  of  G.  Also  for  J  having  full  column  rank,  J'*'  will  denote  (J^  J)~^  J^,  and 
for  V  having  full  row  rank,  will  denote  Note  that  . 

Theorem  3.5.  Let  R(r})  :  R*  — »  R'*  be  continuously  differentiable  in  an  open  convex  set 
D  C  R*.  Let  =  (^*’,6^),  0  €  J2**,  S  €  R^,  let  a  be  a  positive  scalar,  and  Jet  G(q)  = 

there  exists  rj*  €  D  such  that  G'(rj)^G(rf*)  =  0,  and  that  there  exists 

7  >  0  for  which 


for  all  Tj  €  D.  De&ne 


=  II«.|I  +  (i  +  II(J.’'a)-‘||  IIv.II’)  ||v.+||  ||«.||] 

=  (1/2)  [lin+ll  +  (1  +  imii  ||F+||)] . 

if  Cl  <  1,  then  for  any  c  €  (1,1/ci) ,  there  exists  c  >  0  such  that  for  all  tfo  for  which 
||»7o  -  f?*ll  <  the  sequence  generated  by  the  Gauss-Newton  method 

T/k+i  =  r}k-  iG'(rik}^G'(q^))~^  G'{rik)^G(qk) 
is  well  defined,  converges  to  q,  and  obeys 
(3.9)  Ikfc+i -q.ll  <c(ci+C2||»7fc-f7,||)||»jfc-f7,||. 


Proof:  The  crucial  remaining  part  of  the  proof  is  to  note  that  since  the  optimality  condition 
=  0  gives  4-  =  0,  we  have 


R>  =  -a^{V,VT)  ^VTS, 

so  that 

Thus  from  (3.3)  of  Lemma  3.2, 


(3.10) 


<  ||(A’-J.)-‘||  ll«.|i  +  (i  +  ||(J.’-.i.)-||  ||v.f )  i|ji.y 

s|i(-'?’-'.)'‘jj  ll*.ll+(l  +  ||(.l.’'A)-'||||V'.f)||V.+  ||  ||«.|| 


The  remainder  of  the  proof  is  by  induction.  Let  c  be  a  fixed  constant  in  (1, 1/ci) . 
Then  there  exists  ci  >  0  such  that  for  all  ri  for  which  ||»?  -  »j,||  <  cj,  J(i7)andV(f7)  have 
full  column  and  row  rank,  respectively,  and 


(3.11) 

and 


-1 


<  C 


((G'.)’'G'.) 


-1 


(3  12)  ll^(»?rll  +  ll-^(»7)^||(l  +  r(»7)ll  |iV(»7)+|l) 

Let  c  =  min{«i,  (1  -  cci)  /  (2cc2)}  .  Then  at  the  first  iteration  ((G[,)^G(,)  is  nonsingular, 


(3.13) 


Vi  V*  —  Vo  ~  V»  ~  (f»o)^^o 

=  a  +  b 


where 

«  =  -((gj)1’c|,)-‘(g;)^g. 

^  ~  ((^o)^G'o)  (Gq)^  (G,  —  Go  —  Go(r7,  —  rjo))  • 

Since  (G',)^G,  =  0, 

(314)  {G'ofG,  =  {G'o  -  G:fG.  =  (R{,  -  R'^R, 


due  to  the  linearity  of  the  final  q  components  of  Using  (3.14)  plus  (3.8),  (3.10),  and 
(3.11),  we  have 


(3.15) 


ii»ii  <  ||((«Jo)’'G'o)  ‘II  ii«.ii 

<«-r||((G'.)’'G;)'‘|  ll*.tl  ll•?o-1.| 

<  cei  11*70  -»?.  II  • 


Also  note  that  b  is  the  solution  to 


and  that 


(3.16) 


ndn  ||Cjo6  —  [G,  —  Gq  —  Gq  (»7*  —  t;o)]1|  , 


From  (3.8)  and  standard  results 


(3.17) 


lliZ.  -Rc-Ri^iv,-  *70)11  <  h/2)  Iluo  -  *7.f 


Thus  using  (3.16),  (3.17),  Lemma  3.4,  and  (3.12), 

(3.18)  -  I"*'”'"" 

<CC3  ||l)o-IJ.||’. 

Substituting  (3.15)  and  (3.18)  into  (3.13)  and  recalling  ||;7o  —  *7*||  <  (1  —  eci)/2cc2, 

||*7i  -  *7.||  <  c  (ci  +  C2  ||»7o  -  *7.||)  11*70  -  *7.|| 

<  [(l  +  cci)/2]  11^0- *7. II 

which  proves  (3.9)  in  the  case  /:  =  0  and  shows  that  |l;7i  —  »7,|1  <  \\rjo  -  r7,l| .  The  proof  of 
the  induction  step  is  identical. 

Byrd  and  Schnabel  [ByrS85]  show  that  the  trust-region-Levenberg-Marquardt  algo¬ 
rithm  described  in  Section  2  will  reduce  asymptotically  to  Gauss-Newton  if  the  constant 
of  linear  convergence,  ci,  in  Theorem  3.5  is  sufiiciently  small.  In  particular,  our  computer 
code  will  accept  the  trial  step  if  the  ratio  of  actual  to  “predicted”  reduction, 

||G(>7c  +  Zr)||-||g(*7c)|| 

||G(*7c)  +  G'(,7c)^^||-||G(qc)|| 

is  at  least  0.001.  In  this  case,  as  long  as  ci  <  .9999,  the  trust  region  in  the  Levenberg- 
Marquardt  algorithm  is  inactive  asymptotically,  so  that  the  algorithm  becomes  Gauss- 
Newton. 


»»-  </■/-  f.  C.  T>A .  1-Vrf.  I*. .~-V-  «•- .a- -V 
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Now  we  consider  the  behavior  of  the  ODR  problem  (3.2)  as  the  parameter  a  is  varied. 
For  this  purpose,  let  us  denote  the  global  minimizer  to  (3.2)  by  17* (a).  Then  by  standard 
analyses  of  barrier  ftmction  methods,  (see  e.g.,  [FiaM68]  or  [Lue73])  we  know  that  the  limit 
of  n*(<7)  as  O'  —»  00  is  the  solution  to 

min  ||R(i7)|f^  subject  to  6  =  0, 

•? 

i.e.,  the  standard  OLS  problem 

(3.19)  min|li2(^,0)f . 

P 

Similarly,  the  limit  of  ri, (a)  as  o  — »  0  is  the  solution  to  the  implicit  least  squares  (ILS) 
problem 

(3.20)  min||j||^  subject  to  R{rt)  =  0. 

n 

In  the  data  fitting  context  where  R(f/)i  =  /(i*  +  0)  —  yi,  (3.19)  is  the  standard  problem 

where  the  independent  variables  n  are  assumed  exact  so  that  the  metric  of  distance  is  in 
the  y  (vertical)  direction  only.  In  constast  (3.20)  is  the  case  where  the  dependent  variables 
y,  are  assumed  exact  and  the  independent  variables  z,  inexact,  so  that  the  metric  is  entirely 
in  the  z  (horizontal)  direction. 

The  standard  analysis  of  barrier  function  methods  also  shows  that  ||R(i7,((t))||  is  a 
monotonically  increasing  function  of  o,  and  that  p.(o)|{  is  a  monotonically  decreasing 
function  of  a.  This  means  that  for  all  <t  €  (0,  00),  the  values  of  ||i?(f7*((T))||  and  ||^*(<t)|| 
are  bounded  above  by  the  optimal  objective  function  values  for  problems  (3.19)  and  (3.20), 
respectively.  In  data  fitting  terms,  for  any  c,  the  norm  of  the  optimal  vertical  residuals 
in  ODR  is  bounded  above  by  the  norm  of  the  optimal  residuals  in  OLS,  and  the  norm 
of  the  optimal  horizontal  residuals  in  ODR  is  bounded  above  by  the  norm  of  the  optimal 
residual  for  the  ILS  problem.  The  computational  results  of  Section  4  demonstrate  these 
relationships. 

Combining  the  above  facts  with  Theorem  3.5  shows  that,  if  the  optimal  objective 
function  values  for  problems  (3.19)  and  (3.20)  are  sufficiently  small,  and  if  J{r},{a))  and 
V(T],{a))  are  sufficiently  well-conditioned  for  all  a  €  (0,  00),  then  the  Gauss-Newton 
algorithm  applied  to  (3.2)  is  linearly  convergent  for  any  a  €  (0,  00). 

Corollary  3.6.  Let  rj,  P,  6,  R{t]),  G(t/),  J{r}),  and  V(t})  be  deSned  as  in  Theorem  3.5. 
For  any  o  €  (0,  00),  let  Tj,(a)  =  ,  5,(<y)^)^  denote  the  global  solution  to 

(3.21)  minl|R(/3,  S)f  +  a^\\S\\\ 

Pi  t 

Also  let  fdoLS  denote  the  global  solution  to  the  ordinary  least  squares  problem 

minllR(i3,  0)f 
P 
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and  let  {0iLSi  ^ils)  denote  the  globe!  solutions  to  the  implicit  least  aquares  problem 

xnin||£||^  subject  to  R(/3,  S)  =  0. 

fit  ^ 

Let  Rols  =  R{fioLs)'  Assume  that  there  exist  i  >  0,  ■y  >  0  such  that  for  each  a  G  (0,  oo), 

ll-R'C*?)  -  ■«'(»?.  W)ll  <  7  Ik  -  *7*(‘^)ll 

For  all  ri  for  which  ||i/  —  i7*(<y)l|  <  i.  Assume  also  that  for  all  o  e  (0,  oo),  J(ri,{ff))  and 
V{rj,{a))  have  full  column  and  row  rank,  respectively,  and  let  J,  J+,  V,  and  be  uni¬ 
form  bounds  on  |1  J(i?.(a))ll ,  l|d(»7,(<T))+l| .  llV(q,(o))ll ,  and  |lV(ij,(a))+l| ,  respectiveiy, 
over  all  o  €  (0,  oo).  De&ne 

Cl  =  y  i2oL5  +  (l  +  VSiLs 

=  (7/2)  [v+  +  J+  (1  +  VV+)]  . 

Ifci<l,  then  for  any  e  €  (1,  1/ci),  there  exists  i  >  0  such  that  for  any  a  €  (0,  00),  the 
sequence  {q*}  generated  by  the  Gauss-Newton  method  applied  to  (3.2)  starting  from  any 
f7o  for  which  ||i7o  —  »/*(^)||  <  i  is  well-de&ned,  converges  to  ri,(a),  and  obeys 

||»7fc+i  -  »7.(<y)||  <  c  [ci  +  C2  |l»7fc  -  Ikfc  -  • 

4.  Computational  Teating  ^ 

In  this  section  we  report  the  results  of  preliminary  computational  testing.  These 
tests,  consisting  of  two  contrived  problems  and  one  real  problem,  were  selected  in  order  to 
illustrate  the  effectiveness  of  the  implementation  and  to  demonstrate  the  performance  of 
the  basic  algorithm.  They  also  allow  us  to  contrast  ODR  and  OLS,  which  can  have  rather 
dramatic  differences,  and  to  point  out  some  of  the  inherent  difficulties  in  ODR  problems. 

The  contrast  between  OLS  and  ODR  is  best  brought  out  in  terms  of  the  parameter 
(7  and  the  function  0{o)  from  Section  3.  (Recall  that  /9(oo)  corresponds  to  the  OLS 
solution.)  Since,  in  practice,  the  correct  value  of  a  may  not  be  known  exactly,  it  is  of 
interest  to  compute  /?((?)  for  various  values  of  <7. 

The  algorithm  was  coded  in  Fortran  77  and  r\m  in  double  precision  on  the  Perkin- 
Elmer  3230  at  the  National  Bureau  of  Standards  (NBS).  The  graphics  were  done  on  the 
Evans  and  Sutherland  PS-300,  also  located  at  NBS. 

Example  1. 

Consider 

1 

and  define  ij  =  .01  +  (»  —  1)  •  .05, 1  =  1, ...  ,40.  Next  let 


Now  we  perturb  the  data  points  as  follows: 

Xi  :=  Xi  +  rx 
Vi  ■=  Vi  +  ry 

where  the  rx  are  uniformly  distributed  on  (—.05,  .05)  and  the  ry  are  uniformly  distributed 
on  (—.25,  .25).  The  model  for  the  data  was  taken  to  be 

X-A 

and  the  ODR  program  was  run  with  several  values  of  a.  The  results  are  reported  in 
two  tables  and  three  graphs.  Table  1  was  generated  by  setting  <7  =  1  and  taking  0^  = 
(1,1)^.  Subsequent  solutions  for  higher  values  of  a  used  the  previous  solution  for  the 
initial  approximation.  In  addition  to  the  values  of  /9(<7),  Table  1  contains  the  number  of 
evaluations  of  the  extended  residual  function  G  (cf  (2.3))  and  its  Jacobian,  and  the  optimal 
values  of  ||R(i7(<r))||  and  ||^(<7))||  for  each  value  of  <7.  Since  the  value  of  6  was  expected  to 
be  approximately  the  size  of  the  variance  of  the  errors,  we  set  the  weight  T  =  10.  Table 
2  is  organized  just  as  Table  1,  but  the  results  were  generated  by  starting  with  the  OLS 
solution  using  =  (1,  l)^  and  then  decreasing  er.  The  graphs  are  as  follows:  Figure  5 
corresponds  to  the  <7  =  2  fit;  Figure  6  corresponds  to  the  OLS  fit  from  Table  1;  and  Figure 
7  to  the  OLS  fit  from  Table  2.  Note  that  on  all  of  these  plots,  the  y-axis  has  been  scaled 
by  a  factor  of  approximately  100  in  order  to  get  all  of  the  points  on  the  plot.  Because  of 
this  scaling,  the  error  pegs  (connecting  the  data  points  to  their  predicted  values)  appear 
to  be  much  more  horizontal  than  they  really  are. 

Obviously,  Tables  1  and  2  exhibit  a  nonuniqueness  of  the  solutions.  It  appears  that 
there  are  two  local  solutions  for  the  OLS  problem  corresponding  to  the  asymptote  to  +oo 
being  on  the  left  or  right  half  of  the  curve,  and  that  the  trajectories  emanating  from  these 
solutions  come  together  around  a  =  600  or  that  the  triyectory  represented  in  Table  2 
fails  to  be  continuoxis  near  <7  =  600.  A  possible  means  of  investigating  this  phenomenon 
is  to  write  the  differential  equation  describing  the  trigectory  and  to  study  possible 
bifurcation  points.  This  is  not  pursued  here. 

Observe  that  02  determines  the  location  of  the  asymptote  and  thus  the  data  locate  this 
parameter  very  well.  Note,  however,  that  the  data  point  near  the  asymptote,  corresponding 
to  (1.01, 100)^,  completely  dominates  the  fitting  process  for  OLS  in  Table  2  and  results  in 
a  value  of  -.3180  for  0i.  The  ODR  fit  is  not  nearly  so  influenced  by  this  data  point  and,  for 
a  broad  range  of  a,  does  a  very  good  job  of  fitting  the  data.  This  last  point  is  important, 
namely  that  the  parameter  values  do  not  vary  much  as  a  function  of  a,  which  means  that 
<7  may  not  need  to  be  known  with  much  accuracy.  The  stability  of  ^(<7)  has  been  noticed 
on  all  of  our  examples  and  on  problems  not  reported  here.  This  is  not,  of  course,  a  proof 
that  this  phenomenon  holds  more  generally. 

A  further  difference  between  the  OLS  and  the  ODR  fits  is  that  the  errors  for  both 
the  the  OLS  fits  do  not  appear  to  be  random.  Almost  all  of  the  errors  to  the  left  of  the 
asymptote  are  negative  while  all  to  the  right  are  positive.  The  ODR  errors  for  reasonable 
values  of  a  appear  to  be  much  more  random. 
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An  examination  of  the  computational  results  reveals  that  the  only  hard  optimization 
problem  in  each  set  is  the  first.  Subsequent  solutions  are  found  very  quickly  except, 
of  course,  for  the  problem  corresponding  to  ^  =  500  in  Table  2  which  appears  to  have 
jumped  across  a  discontinuity  in  l3{o).  A  detailed  examination  of  the  iteration  process 
shows  that  the  algorithm  sometimes  slows  down  (a  very  small  value  of  r  is  generated)  but 
then  recovers  and  final  convergence  is  with  full  Gauss-Newton  steps.  For  the  case  a  =  500, 
fairly  large  steps  in  6  were  generated  which  led  to  apparent  convergence  with  very  poor  ^ 
values  (near  (0,  l))  and  very  large  values  of  6  (G(l)).  In  this  case,  a  very  small  value  of  r 
was  produced.  When  the  procedure  was  restarted  with  a  large  value  of  r,  the  algorithm 
immediately  stepped  over  this  bad  region  and  converged  quickly  to  the  correct  solution. 
Thus,  it  appears  to  be  important  to  scale  the  step  in  6  correctly  and  to  be  on  the  lookout 
for  unrealistic  solutions. 

Example  2. 

This  exzunple  is  a  two  dimensional  version  of  Example  1.  Here  we  take  z  €  R?  and 

1 

y  = - . 

Xx  +  Xl-l 

This  function  has  a  line  of  singularities  along  Zi  +  Z2  =  1.  We  take  the  data  to  be  on 
the  rectangular  grid  of  width  .1  in  the  zj  direction  and  width  .2  in  the  Z2  direction.  The 
first  point  is  (.01,  .01)^  and  there  are  10  points  in  the  zj  direction  and  5  points  in  the  Z2 
direction,  y  is  the  evaluated  at  these  points  and  the  data  are  then  perturbed  according  to 
the  following: 

(zi),-  :=  (zi),-  +rz 
(Z2).  :=  (z2).-  +  rz 
y.  ;=  y.  +  ry 

where  rz  are  normally  distributed  with  mean  0  and  8tand2U'd  deviation  .01  and  the  ry  are 
distributed  normally  with  mean  0  and  standeird  deviation  .04. 

The  form  of  the  model  is 


^  ^2X1+^312-1 

The  results  are  given  in  Table  3  and  in  Figures  8-12.  Table  3  is  organized  as  Table  1 
and  Figures  8-12  correspond  to  the  values  of  a  noted.  The  graphs  are  compressed  by  a 
factor  of  approximately  80  in  the  y  direction  in  order  to  get  all  of  the  data  points  on  the 
picture.  Again  we  observe  that  the  values  of  /?(<?)  do  not  vary  quickly  and  that  the  fits 
depend  more  and  more  on  the  points  near  the  asymptote  as  a  increases.  Note  that  the 
location  of  the  asymptote  is  well-determined  by  the  data  and  that  only  Pi  changes  much  as 
o  increases.  Here  the  insistence  on  near  vertical  measures  of  the  error  forces  Pi  to  assume 
smaller  values  which  has  the  effect  of  flattening  the  function  as  much  as  possible  near  the 
asymptote.  This,  of  course,  tends  to  minimize  the  vertical  component  of  the  error.  As  in 
Example  1,  the  errors  for  the  OLS  fit  do  not  appear  random  while  those  for  the  ODR  fits 
do. 
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Note  that  the  first  solution,  corresponding  to  o  =  1,  was  computed  with  some  diffi¬ 
culty.  (This  IS  the  same  situation  as  occured  for  o  =  500  from  Table  2.)  In  these  cases, 
the  terram  in  pwameter  space  {0  and  tf)  appears  rather  flat  and  fairly  large  values  of  6 
were  again  obtained  on  intermediate  iterations.  The  iteration  stalled  with  an  indication 
of  convergence  due  to  x-convergeoce  and  a  very  small  value  of  the  trust  region  radius  A 
r«tart  (which  resets  the  trust  region  radius  to  a  larger  value)  then  allows  the  iterates  to 
step  over  this  flat  area  and  converge  very  quickly  to  the  correct  answer. 

The  non-uniqueness  observed  in  Example  1  was  again  observed  here.  The  details  are 
not  repo^d,  but  we  found  a  second  OLS  solution  which  led  to  a  trajectory  of  solutions 
that  finally  joined  the  above  triyectory  at  o  =  2. 

Example  S. 


The  data  here  are  actual  measurements  from  a  calibration  run  on  an  electronic  device 
which  was  mtended  to  give  a  flat  response  over  a  wide  range  of  frequencies.  In  the 
data,  the  z-values  are  in  units  of  frequency  squared  and  the  y-data  are  the  gain.  The 
z-valuM  we  scaled  to  the  interval  (0, 1)  with  several  measurements  made  in  each  decade 
from  10  to  1.  More  measurements  were  taken  at  the  higher  frequencies  since  most  of 
the  important  mformation  is  obtained  there.  The  data  are  plotted  in  Figures  13  and  14 
with  a  log  scale  on  the  z-axis  in  order  to  see  the  situation  better.  The  y-scale  on  these 
graplw  has  been  magnified  to  accentuate  the  differences.  The  peak  at  the  right  side  of  the 
data  IB  at  1.001  while  the  low  point  at  the  extreme  right  end  is  at  .9473.  The  flat  part  at 
the  left  side  is  all  near  .9882. 

The  model  for  this  data  was  obtained  from  theoretical  considerations  and  has  the  form 


where  the  parameters  to  be  determined  are 


+  M, 


t  =  1,...,44 


0  -  (oi,..., 04,^,711  •••,74)^- 

Estimates  of  the  pole  locations— the  negative  7-value8-are  likewise  obtained  from  other 
analyses.  The  7-values  are  approximately 

71  =  1.38  X  10“® 

72  =  5.96  X  10“* 

73  =  6.71  X  10' 

74  =  1.07  X  10®. 

Since  all  of  the  poles  are  negative  and  all  of  the  data  have  positive  x-values,  there 
IS  no  problem  with  being  close  to  the  asymptotes.  The  range  of  the  z-values,  however, 

impliM  the  need  to  scale  the  trust  region.  We  used,  for  the  diagonal  scaling  matrices  S 
and  r,  the  following: 

1 


mi\ 
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It  turns  out  that  the  measurements  are  proportionately  more  accurate  at  the  lower  fire- 
quencies  and  we  therefore  took  the  d-weights  to  be  the  same  as  the  t-weights. 

While  the  data  were  measured  quite  accurately,  there  were  simply  no  data  at  a  suffi¬ 
ciently  high  frequency  to  warrant  keeping  the  two  terms  corresponding  to  j  =  3  and  j  =  A 
in  the  model.  This  situation  was  evidenced  by  the  fact  that  the  Jacobian  J  had  five  almost 
identical  columns. 

With  these  terms  removed,  the  resulting  problem  was  easily  solved  as  follows.  Using  a 
feature  of  the  program  which  allows  certain  parameters  to  be  held  fixed  at  specified  values, 
we  fixed  the  pole  values  (the  'y-values)  and  used  an  OLS  estimate  of  the  remaining  linear 
parameters.  We  then  freed  all  of  the  parameters  and  did  an  OLS  fit  and  and  ODR  fits 
with  several  values  of  o.  In  doing  the  ODR  fits,  we  first  specified  a  o-value  of  .01  since 
the  gain  measurements  in  this  data  set  were  100  times  more  accurate  than  the  frequency 
measurements.  Other  values  of  a  were  subsequently  used  for  comparison.  The  results  are 
in  Table  4  and  Figures  13  and  14  which  depict  the  OLS  fit  and  the  ODR  fit  for  a  =  .01, 
respectively.  Virtually  no  difference  appears  between  the  two  at  the  lower  frequencies,  but 
some  differences  occur  at  the  higher  frequencies.  In  the  enlargements  (Figures  15  and  16) 
one  can  easily  see  that  the  contribution  of  the  error  in  the  x-values  causes  ODR  to  get 
a  significantly  better  fit  than  OLS.  While  the  )9-values  are  not  reported  here,  there  were, 
again,  very  slow  changes  in  0{c). 

In  this  section  we  have  shown  that  our  algorithm  is  effective  on  highly  nonlinear 
problems,  but  that  these  problems  themselves  often  have  multiple  solutions  and  other 
difficulties  which  imply  that  potential  solutions  need  to  be  studied  carefully.  In  subsequent 
papers,  we  will  provide  a  more  complete  description  of  our  implementation  and  further 
results  on  its  performance. 
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TabU  1 


c 

fiiio) 

02io) 

1 

1.023 

1.006 

2 

1.021 

1.005 

5 

1.015 

1.004 

25 

0.0847 

1.002 

100 

0.0247 

0.0072 

300 

0.0881 

0.0028 

500 

0.0487 

0.0053 

1000 

0.8248 

0.0037 

OOi 

0.6867 

0.0000 

a 

0i{o) 

00] 

-0.3170 

1.010 

1000 

-0.3355 

1.005 

700 

-0.3845 

1.003 

500* 

0.0487 

0.0053 

Evaluations 

of 

Final  Value  of 

G 

G> 

l|i2(»7H)ll 

\m\\ 

70 

25 

0.223 

0.355 

6 

5 

0.454 

0.223 

6 

4 

0.771 

0.128 

6 

5 

1.280 

0.080 

0 

8 

3.204 

0.061 

13 

12 

10.408 

0.035 

12 

11 

15.524 

0.018 

10 

0 

18.881 

0.007 

7 

6 

21.774 

0. 

Table  2 

Evaluations 

of 

Final  Value  of 

G 

& 

P(»7W)11 

40 

22 

104.700 

0. 

20 

15 

104.223 

0.007 

27 

21 

103.660 

0.015 

103 

43 

15.524 

0.018 

Table  3 

a 

M<^) 

02{c) 

V 

0.8988 

0.0482 

1.015 

2 

0.0223 

0.9478 

1.019 

4 

0.9345 

0.0506 

1.027 

10 

0.0049 

0.0510 

1.047 

40 

0.7148 

0.9568 

1.044 

100 

0.3645 

0.0343 

0.0804 

500 

0.0014 

0.8830 

0.9675 

00 

0.1102 

0.8883 

0.0338 

Evaluations  of 

a 

G 

G' 

00 

10 

0 

.01 

18 

7 

.1 

12 

7 

1. 

5 

4 

Evaluations  of 

Final  Value  of 

G 

G' 

ll«(»lW)ll  ll«(«r)|| 

147 

60 

0.184  0.670 

7 

6 

0.428  0.618 

8 

7 

0.980  0.540 

9 

8 

2.370  0.429 

10 

0 

6.411  0.315 
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